Xi-Language Reference: Basic Notes

    • Naming Conventions
    • Whitespaces, Comments, Identifiers (What is the part of the Lexer?)

    Naming Conventions

    Internally, the Xi interpreter is divided into three main parts: the Lexer, the Parser and the Interpreter proper. The Lexer reads input from the user or from a file, translates it into so-called tokens (i.e. the basic elements of a programming language) and passes these tokens to the Parser. The Parser then translates the tokens, according to given rules, into an abstract sequence of expressions and statements that the Interpreter can understand. The Interpreter does nothing more than the actual calculation work, following the statement sequence it receives from the Parser. To make this clear: only the Lexer and the Parser define the Xi programming language; the Interpreter is entirely unaware of this definition. Think of the Lexer and Parser as a compiler and of the Interpreter as a very clever CPU.
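
    The division of labour can be illustrated with a toy pipeline. This is only a sketch in Python, not Xi's actual implementation: the function names lex, parse and interpret, the tuple-based tree and the tiny grammar (non-negative integers joined by +) are all assumptions made for the illustration.

```python
import re

def lex(source):
    """Lexer: turn raw characters into tokens.
    Toy assumption: only integers and '+' exist; anything else is dropped."""
    return re.findall(r"[0-9]+|\+", source)

def _number(token):
    """Helper for the toy parser: wrap one numeric token in a tree node."""
    if not token.isdigit():
        raise SyntaxError("expected a number, got %r" % token)
    return ("num", int(token))

def parse(tokens):
    """Parser: turn tokens into an abstract tree of nested tuples."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected: number (+ number)*")
    tree = _number(tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError("expected '+', got %r" % tokens[i])
        tree = ("add", tree, _number(tokens[i + 1]))
    return tree

def interpret(tree):
    """Interpreter: pure calculation on the tree; it never sees source text."""
    if tree[0] == "num":
        return tree[1]
    return interpret(tree[1]) + interpret(tree[2])

result = interpret(parse(lex("1 + 2 + 39")))
```

    Note that interpret works only on the tuples produced by parse. Changing the surface syntax would mean changing lex and parse, while interpret could stay as it is.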

    Whitespaces, Comments, Identifiers (What is the part of the Lexer?)

    As stated above, the user's input is first sent through the Lexer, which translates it into tokens. However, not every piece of input becomes a token. For example, a comment should not be a token: the Parser has more important things to do than bother with junk like this. Furthermore, it should make no difference whether the user types one space or two at any given point.

    To be precise: whitespace is a sequence of characters that is equivalent to no character at all; the Lexer eats whitespace and forgets it at once. In Xi (as in C), whitespace characters are spaces, tabs and newlines. In the following, whitespace may be abbreviated as <white>.

    A comment is equivalent to whitespace: the Lexer eats and forgets it as well. In Xi (as in C), a comment begins with /* and ends with */. There is no need to define a separate abbreviation for comments.
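
    How a lexer might eat and forget whitespace and comments can be sketched as follows. The sketch is in Python and is not taken from the Xi sources; the name skip_blanks is hypothetical, and raising an error for an unterminated comment is an assumption, since the text does not say what Xi does in that case.

```python
def skip_blanks(text, pos=0):
    """Return the index of the first character at or after pos that is
    neither whitespace (space, tab, newline) nor inside a /* ... */ comment."""
    while pos < len(text):
        if text[pos] in " \t\n":
            # Whitespace: skip a single character and look again.
            pos += 1
        elif text.startswith("/*", pos):
            # Comment: jump behind the first closing */ (no nesting, as in C).
            end = text.find("*/", pos + 2)
            if end == -1:
                # Assumption: an unterminated comment is treated as an error.
                raise SyntaxError("unterminated comment")
            pos = end + 2
        else:
            break
    return pos

source = "  /* a note */\n\tabc"
start = skip_blanks(source)
```

    Here source[start:] is "abc": everything before it was whitespace or a comment and never reaches the Parser.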

    Now we come to one of the basic tokens of Xi: a number is a sequence of characters that begins with a digit and contains only digits or one of the following: .eExXaAbBcCdDfF. Valid inputs are integer numbers like 123, rational numbers like 123.45, hexadecimal numbers like 0x01af and exponential numbers like 1e3 or 2.3e-23 (as the last example shows, an exponent may also carry a sign). The concept should be clear; the syntax is equivalent to that of C. In the following, a number may be abbreviated as <num>.
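
    The character rule for numbers can be sketched as a simple scan. This is an illustration in Python, not Xi's actual code: the name scan_number is hypothetical, and the handling of a sign after e or E (allowed only outside hexadecimal numbers and only when a digit follows) is an assumption derived from the example 2.3e-23. The scan only finds the extent of a candidate; it does not check that the candidate is well formed (1.2.3 would pass).

```python
DIGITS = "0123456789"
NUMBER_CHARS = set(DIGITS + ".eExXaAbBcCdDfF")

def scan_number(text, pos=0):
    """Return the end index of the <num> candidate starting at pos,
    or pos itself if no number starts there."""
    if pos >= len(text) or text[pos] not in DIGITS:
        return pos
    is_hex = text[pos:pos + 2] in ("0x", "0X")
    end = pos + 1
    while end < len(text):
        ch = text[end]
        if ch in NUMBER_CHARS:
            end += 1
        elif (ch in "+-" and not is_hex and text[end - 1] in "eE"
              and end + 1 < len(text) and text[end + 1] in DIGITS):
            # Assumption: a sign directly after the exponent marker
            # belongs to the number, as in 2.3e-23.
            end += 1
        else:
            break
    return end

line = "2.3e-23 + 0x01af"
first_end = scan_number(line)
```

    With this input, line[:first_end] is "2.3e-23"; a second call at the position of 0x01af would cover that whole hexadecimal number.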

    The second basic token of Xi is the string. A string is a sequence of characters enclosed in double quotes (") that should not contain a newline. Unlike in C, two adjacent strings are not automatically concatenated: a sequence "Chars" <white> "Chars" is valid in C, but in Xi it leads to a syntax error (this is one of the many things on Bodo's todo-list :-)). In the following, a string may be abbreviated as <string>.
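
    A string scan along these lines might look like the following Python sketch. It is not Xi's implementation: the name scan_string is hypothetical, rejecting a newline inside a string with an error is an assumption, and escape sequences are ignored because the text does not mention them.

```python
def scan_string(text, pos=0):
    """Scan a <string> starting at pos.
    Return (contents, end) where end is the index behind the closing quote."""
    if pos >= len(text) or text[pos] != '"':
        raise SyntaxError("expected an opening double quote")
    end = pos + 1
    # Walk forward to the closing quote; stop early at a newline.
    while end < len(text) and text[end] not in '"\n':
        end += 1
    if end >= len(text) or text[end] == "\n":
        # Assumption: a newline or end of input inside a string is an error.
        raise SyntaxError("unterminated string")
    return text[pos + 1:end], end + 1

source = '"Chars" "Chars"'
first, after_first = scan_string(source)
second, after_second = scan_string(source, after_first + 1)
```

    The two calls yield two separate <string> tokens. A lexer built this way does not merge them, so it is up to the Parser to reject two strings in a row.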

    Last but not least, there are the identifiers. The names of functions and variables are identifiers. An identifier is a sequence of characters that begins with a letter (a-z or A-Z) or an underscore (_) and otherwise contains only letters, digits (0-9), underscores or apostrophes (') (the apostrophe is a necessary but not very good extension to C). So abc, AB9_5gh, _abc9' and y' are valid identifiers. Not allowed are 9fhs, hjsd-sdhjk or junk like this. In the following, an identifier may be abbreviated as <id>.
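
    The identifier rule translates directly into a regular expression. The following Python sketch (the names IDENTIFIER and is_identifier are hypothetical) checks the examples given above.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, underscores or apostrophes.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")

def is_identifier(candidate):
    """Return True if the whole candidate is a valid <id>."""
    return IDENTIFIER.fullmatch(candidate) is not None

valid = [name for name in ["abc", "AB9_5gh", "_abc9'", "y'", "9fhs", "hjsd-sdhjk"]
         if is_identifier(name)]
```

    After running this, valid holds the first four names; 9fhs fails because it starts with a digit, and hjsd-sdhjk fails because of the hyphen.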


    © 1995 by Bodo Junglas, Klaus Spanderen and Fabian Weis
    - Last revised: April 23 1996